Run fixpoint per strongly connected component #999
Conversation
CodSpeed Performance Report: Merging #999 will degrade performance by 14.71%.
Hmm, I'm surprised that this is even slower than #995, considering that most logic now lives directly in `…`. Either way, I think there are some optimization opportunities:
Force-pushed from a2b3da6 to 058fa5b.
```rust
// If the value is from the same revision but is still provisional, consider it changed
// because we're now in a new iteration.
```
I always found the `deep_verify_memo` logs for provisional values confusing. Exiting early here ensures they don't show up.
Force-pushed from ab90dd2 to dbb3eb0.
The amortized benchmark results make no sense to me... I didn't touch `…`.
Fix serde attribute
Force-pushed from 3ef0cbe to 81137dd.
Force-pushed from 06826c6 to d6f0f20.
I found one more bug. We hit `…`. I'm not entirely sure yet what's happening.
@MichaReiser Should I review this now, or is it better to wait for further exploration of this new bug?
It's ready. I very much hope that any bugs I discover now won't require a significant rewrite. The most recent bug, related to panicking cycles, required only a 3-line fix.
Force-pushed from e5a0069 to c5cc3dd.
```rust
let mut opt_last_provisional: Option<&Memo<'db, C>> = None;
// This is different from `opt_old_memo` which might be from a different revision.
let mut last_provisional_memo: Option<&Memo<'db, C>> = None;
// TODO: Can we seed those somehow?
```
This is #980
```rust
// This isn't strictly necessary, but if this is a provisional memo for an inner cycle,
// await all outer cycle heads to give the thread driving it a chance to complete
// (we don't want multiple threads competing for the queries participating in the same cycle).
```
This is required for fallback immediate in a multi-threaded context. I didn't spend any time understanding why.
```rust
// (we can't rely on `iteration_count` being updated for nested cycles because the nested cycles may have completed successfully).
// b) It's guaranteed that this query will panic again anyway.
// That's why we simply propagate the panic here. It simplifies our lives and it also avoids duplicate panic messages.
if old_memo.value.is_none() {
```
If `value.is_none()` always means "panicked" (which is what it looks like is implied here), would it make sense to reflect that more clearly in method naming, e.g. add a `memo.panicked()` or similar?
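For illustration, a minimal sketch of what that suggested helper could look like (`Memo` here is a simplified stand-in, and `panicked` is the hypothetical method, not an existing Salsa API):

```rust
// Simplified stand-in for Salsa's memo type; `panicked` is the
// hypothetical helper suggested above.
struct Memo<V> {
    value: Option<V>,
}

impl<V> Memo<V> {
    /// In this code path, a missing value means the query panicked
    /// during an earlier execution in this revision.
    fn panicked(&self) -> bool {
        self.value.is_none()
    }
}

fn main() {
    let memo: Memo<i32> = Memo { value: None };
    assert!(memo.panicked());
}
```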
```rust
// t1: a (completes `b` with `c` in heads)
//
// Note how `a` only depends on `c` but not `a`. This is because `a` only saw the initial value of `c` and wasn't updated when `c` completed.
// That's why we need to resolve the cycle heads recursively to `cycle_heads` contains all cycle heads at the moment this query completed.
```
```diff
- // That's why we need to resolve the cycle heads recursively to `cycle_heads` contains all cycle heads at the moment this query completed.
+ // That's why we need to resolve the cycle heads recursively so `cycle_heads` contains all cycle heads at the moment this query completed.
```
I didn't review everything too closely, but I left a couple nits and questions.
```rust
// b) It's guaranteed that this query will panic again anyway.
// That's why we simply propagate the panic here. It simplifies our lives and it also avoids duplicate panic messages.
if old_memo.value.is_none() {
    ::tracing::warn!("Propagating panic for cycle head that panicked in an earlier execution in that revision");
```
Is there a reason we're using the `tracing` crate directly instead of our wrapper (and in other places as well)? Is it because this is already a cold path?
I tried to add `warn` to our own tracing but Rust was mad at me for `warn` being a reserved keyword or something. I then decided not to bother because we only use `warn` in very cold paths.
```rust
pub(crate) struct SyncState {
    id: ThreadId,
    /// The thread id that is owning this query (actively executing it or iterating it as part of a larger cycle).
```
```diff
- /// The thread id that is owning this query (actively executing it or iterating it as part of a larger cycle).
+ /// The thread id that currently owns this query (actively executing it or iterating it as part of a larger cycle).
```
```rust
}

#[derive(Copy, Clone, Debug)]
pub enum SyncOwnerId {
```
Nit: Maybe just call this `SyncOwner`, since it doesn't necessarily hold an ID.
```rust
/// Signalled whenever a query with dependents completes.
/// Allows those dependents to check if they are ready to unblock.
// condvar: unsafe<'stack_frame> Pin<&'stack_frame Condvar>,
```
Looks like this was mistakenly removed.
It wasn't clear to me what it documents. Do you think this should be part of the field comment?
Yeah it would be fine as a field comment.
```rust
pub(super) fn mark_as_transfer_target(&self, key_index: Id) -> Option<SyncOwnerId> {
    let mut syncs = self.syncs.lock();
    syncs.get_mut(&key_index).map(|state| {
        state.anyone_waiting = true;
```
Why do we unconditionally set `anyone_waiting` here?
I can add a comment. It's an optimization so that `ClaimGuard::release` can always early exit if `anyone_waiting` is false, without having to check `is_transfer_target` too.
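A rough sketch of that fast path (hypothetical types and shape, not Salsa's actual `ClaimGuard`): because `mark_as_transfer_target` also sets `anyone_waiting`, the early exit below can never skip a pending transfer.

```rust
// Hypothetical shapes, for illustration only.
struct SyncState {
    anyone_waiting: bool,
    is_transfer_target: bool,
}

fn release(state: &SyncState) {
    // Fast path: nobody is waiting AND (because `mark_as_transfer_target`
    // also sets `anyone_waiting`) no transfer can be pending either.
    if !state.anyone_waiting {
        return;
    }
    if state.is_transfer_target {
        // Hand the lock over to the owning query instead of releasing it.
        return;
    }
    // Otherwise, wake up the blocked threads.
}

fn main() {
    release(&SyncState { anyone_waiting: false, is_transfer_target: false });
}
```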
```rust
/// Whether this query has been claimed by the query that currently owns it.
///
/// If `a` has been transferred to `b` and the stack for t1 is `b -> a`, then `a` can be claimed
/// and `claimed_transferred` is set to `true`. However, t2 won't be able to claim `a` because
```
```diff
- /// and `claimed_transferred` is set to `true`. However, t2 won't be able to claim `a` because
+ /// and `claimed_twice` is set to `true`. However, t2 won't be able to claim `a` because
```
```rust
.iter()
.find(|active_query| {
    cycle_heads.contains(&active_query.database_key_index)
        && active_query.database_key_index != current_key
```
Is this enough to identify strongly connected components? Do we never need to prefer the outermost query for deeply nested cycles?
Finding the outermost query is somewhat difficult when the cycle stretches across multiple threads. At some point, I had an implementation that queried the `DependencyGraph` to find the outermost cycle. However, it turned out that finding the outermost cycle was fairly expensive. Just finding "an outer" query is cheaper, and we can leave it to "that outer" query to find the "next outer" query. Ultimately, we'll reach the outermost query.

Things are much easier if all queries are on the stack, which is what we handle here. We start from the outermost active query and try to find the first query that is a cycle head. This gives us a "local maximum" for what the outermost query is.
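A toy version of that "local maximum" search (hypothetical signature and names; the real code is the snippet quoted above):

```rust
/// Walk the active-query stack from the outermost query inward and return
/// the first query that is a cycle head other than the current one.
fn find_outer_head(
    stack: &[&'static str],
    cycle_heads: &[&'static str],
    current: &'static str,
) -> Option<&'static str> {
    stack
        .iter()
        .copied()
        .find(|q| cycle_heads.contains(q) && *q != current)
}

fn main() {
    // Stack, outermost first: a -> b -> c, where `a` and `c` are cycle heads.
    // Starting from `c`, the "local maximum" outer head is `a`.
    assert_eq!(find_outer_head(&["a", "b", "c"], &["a", "c"], "c"), Some("a"));
}
```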
```rust
let memo_iteration_count = old_memo.revisions.iteration();

// The `DependencyGraph` locking propagates panics when another thread is blocked on a panicking query.
// However, the locking doesn't handle the case where a thread fetches the result of a panicking cycle head query **after** all locks were released.
```
This PR rewrites how Salsa runs fixpoint iterations when multiple cycle heads participate in a cycle (when there are multiple nested cycles).
Motivation
The main motivation for this is that we see a runaway in ty for code like this:
Or, as a graph:
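(A sketch reconstructed from the SCC list below, showing only the edges it names:)

```
x1    x2        (no dependencies)

y1 <-> y2
y1 <-> y3       (one SCC: y1, y2, y3)

z1 <-> z2       (another SCC: z1, z2)
z1  -> y1       (z1 depends on y1 in one direction only)
```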
Today, we'll run separate fixpoint iterations for `y1 <-> y2`, `y3 <-> y1`, and `z1 <-> z2`.

The downside of this is that Salsa runs the nested fixpoint iteration `y1 <-> y3` to convergence for every `y1 <-> y2` iteration. If there are more deeply nested cycles, then each of those inner cycles is run to completion for every `y1 <-> y2` and `y1 <-> y3` iteration, making the number of iterations dependent on the nesting of queries rather than on how quickly the problem converges.

To avoid this, the idea is to identify the strongly connected components in the dependency graph. These are:

- `x1` (no dependencies)
- `x2` (no dependencies)
- `y1`, `y2`, `y3`
- `z1`, `z2` (only depends on `y1` in one direction but not both)

Then, run a fixpoint per strongly connected component. That means we only run 2 fixpoint iterations for the example above. This ensures that the runtime is bound by how quickly the problem converges and not by how deeply nested the queries are (which creates a tension between granular queries and performance).
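To make "identify the strongly connected components" concrete, here is a small, self-contained sketch; this is not Salsa's implementation, it just runs Tarjan's algorithm on the example graph above:

```rust
use std::collections::HashMap;

/// Tarjan's algorithm over the example graph; each `Vec` in `sccs`
/// is one strongly connected component and would get its own fixpoint.
struct Tarjan<'g> {
    graph: &'g HashMap<&'static str, Vec<&'static str>>,
    next_index: usize,
    indices: HashMap<&'static str, usize>,
    lowlink: HashMap<&'static str, usize>,
    stack: Vec<&'static str>,
    sccs: Vec<Vec<&'static str>>,
}

impl<'g> Tarjan<'g> {
    fn visit(&mut self, v: &'static str) {
        self.indices.insert(v, self.next_index);
        self.lowlink.insert(v, self.next_index);
        self.next_index += 1;
        self.stack.push(v);

        for &w in self.graph.get(v).into_iter().flatten() {
            if !self.indices.contains_key(w) {
                self.visit(w);
                let low = self.lowlink[v].min(self.lowlink[w]);
                self.lowlink.insert(v, low);
            } else if self.stack.contains(&w) {
                let low = self.lowlink[v].min(self.indices[w]);
                self.lowlink.insert(v, low);
            }
        }

        if self.lowlink[v] == self.indices[v] {
            // `v` roots an SCC: the component is the stack suffix above `v`.
            let pos = self.stack.iter().position(|&n| n == v).unwrap();
            self.sccs.push(self.stack.split_off(pos));
        }
    }
}

fn main() {
    let graph = HashMap::from([
        ("x1", vec![]),
        ("x2", vec![]),
        ("y1", vec!["y2", "y3"]),
        ("y2", vec!["y1"]),
        ("y3", vec!["y1"]),
        ("z1", vec!["z2", "y1"]),
        ("z2", vec!["z1"]),
    ]);

    let mut tarjan = Tarjan {
        graph: &graph,
        next_index: 0,
        indices: HashMap::new(),
        lowlink: HashMap::new(),
        stack: Vec::new(),
        sccs: Vec::new(),
    };
    for v in ["x1", "x2", "y1", "y2", "y3", "z1", "z2"] {
        if !tarjan.indices.contains_key(v) {
            tarjan.visit(v);
        }
    }

    // Prints [["x1"], ["x2"], ["y1", "y2", "y3"], ["z1", "z2"]]:
    // only the `y` and `z` components need a fixpoint iteration.
    println!("{:?}", tarjan.sccs);
}
```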
Single-threaded implementation
Let's ignore multi-threading first as this only complicates things ;)
The change in itself isn't that complicated:

- `execute_maybe_iterate`: If the query is part of an outer cycle, then don't remove the cycle head from `cycle_heads`; set a flag on the memo indicating whether this head has converged, and return the provisional value. The query will iterate as part of the outer cycle.
- Only finish the fixpoint iteration once all cycle heads (`memo.cycle_heads`) have converged.

That's roughly it :)
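To illustrate the "one fixpoint per component" idea with a toy model (this is not Salsa's code): all heads of one strongly connected component are recomputed together from their previous provisional values until nothing changes.

```rust
/// Run one fixpoint for a whole strongly connected component: `step`
/// recomputes all heads from the previous provisional values.
fn fixpoint_scc(values: &mut Vec<i32>, step: impl Fn(&[i32]) -> Vec<i32>) -> usize {
    let mut iterations = 0;
    loop {
        iterations += 1;
        let next = step(values);
        if next == *values {
            // Every head in the component converged in the same iteration.
            return iterations;
        }
        *values = next;
    }
}

fn main() {
    // Two mutually dependent "queries", seeded with initial provisional values.
    // Toy convergence: each head saturates at 3.
    let mut values = vec![0, 0];
    let iterations = fixpoint_scc(&mut values, |v| vec![(v[1] + 1).min(3), (v[0] + 1).min(3)]);
    assert_eq!(values, vec![3, 3]);
    assert_eq!(iterations, 4);
}
```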
Multithreading
Where things get complicated is multithreading. I tried to keep the existing locking (where we wait on cycle heads in some places), but I wasn't able to make it work (there were spurious errors till the end). That's why I ended up rewriting cycle head synchronization from scratch.
What makes cycles tricky is that multiple threads can enter the same cycle simultaneously, each starting from a different cycle head that ultimately resolves to the same outer cycle. The new implementation makes this even trickier because inner cycles now complete immediately without iterating, which results in them releasing their locks. That allows other threads to claim those locks again, which can result in them competing for the same locks forever.
This PR extends Salsa's `DependencyGraph` with a mechanism to transfer the ownership of a query to another query. Let's say we have `a -> b -> a` and `b -> c -> b`. The inner cycle head `b` can transfer its lock to the query `a`. `b` then remains locked until `c` completes (successfully or panicking). Only the thread owning `a` (because it's the thread holding that lock, or because it's another thread that `a` is blocked on) is allowed to reclaim `b`.

This ensures that the locks of all cycle heads participating in a cycle ultimately converge to be owned by a single query/thread. We need to do this for every query that supports cycle handling, even if it isn't a cycle head, because the query could become a cycle head in later iterations (e.g., due to changes in the cycle-entry point).
I also found this approach much easier to reason about than the old approach of "just wait for all cycle heads".
However, there's a downside. It requires acquiring the global `DependencyGraph` lock and tracking dependencies whenever claiming or releasing any query with cycle-handling support that participates in a cycle. We incur this cost even when all queries participating in the cycle run on the same thread; this is also the main reason why `converge-diverge`'s performance regresses. I'm not sure if there's a solution to avoid this.

TODO:

- `maybe_changed_after`
- `cycle_nested_deep_conditional_changed::the_test` sometimes hangs or panics